04. Fruit Game Sample
Fruit
As with the previous PyTorch Cartpole example, the agent learns "from vision," translating the raw pixel array into actions using DQN. In this 2D graphical example, the agent appears at a random location and must find the "fruit" object to gain the reward and win the episode before running out of bounds or before the timeout period expires. The agent has 5 possible actions to choose from: up, down, left, right, or none, which it uses to navigate the screen toward the object.
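For reference, the 5-action space can be represented as a small C++ enum. The sketch below is illustrative: ACTION_NONE and NUM_ACTIONS appear in the project code shown later on this page, while the remaining names are assumptions standing in for the project's actual AgentAction definition.

// Illustrative sketch of the agent's discrete action space.
// ACTION_NONE and NUM_ACTIONS are referenced in fruit.cpp; the
// other names are placeholders for the real AgentAction enum.
enum AgentAction
{
	ACTION_UP = 0,   // move up
	ACTION_DOWN,     // move down
	ACTION_LEFT,     // move left
	ACTION_RIGHT,    // move right
	ACTION_NONE,     // stay in place
	NUM_ACTIONS      // total number of actions (5)
};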
Implementation
The fruit code looks very similar to the catch code. It's important to note that the same agent class is used in both environments!
// Create reinforcement learner agent in PyTorch
dqnAgent* agent = dqnAgent::Create(gameWidth, gameHeight,
NUM_CHANNELS, NUM_ACTIONS, OPTIMIZER,
LEARNING_RATE, REPLAY_MEMORY, BATCH_SIZE,
GAMMA, EPS_START, EPS_END, EPS_DECAY,
USE_LSTM, LSTM_SIZE, ALLOW_RANDOM, DEBUG_DQN);
The parameter values are slightly different (the frame size and number of channels have changed), but the algorithm for training the network to produce actions from inputs remains the same.
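As a rough illustration of those parameters, the settings behind the call above typically look something like the following. The values below are placeholders for illustration only, not the sample's actual defaults; the one grounded value is the 48x48 frame size mentioned later on this page.

// Illustrative settings only -- the real defaults are defined in fruit.cpp
#define NUM_CHANNELS  1      // e.g. one grayscale input channel (assumption)
#define NUM_ACTIONS   5      // up, down, left, right, none
#define GAMMA         0.9f   // discount factor (placeholder value)
#define EPS_START     0.9f   // initial exploration rate (placeholder value)
#define EPS_END       0.05f  // final exploration rate (placeholder value)
#define EPS_DECAY     200    // exploration decay rate (placeholder value)

The gameWidth and gameHeight values default to the 48x48 frame size and can be overridden from the command line (see Alternate Arguments below).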
The environment is more complicated for fruit than it is for catch, so it has been extracted into the fruitEnv.cpp module and its own class, FruitEnv. The environment object, named fruit, is instantiated in the fruit.cpp module.
// Create Fruit environment
FruitEnv* fruit = FruitEnv::Create(gameWidth, gameHeight, epMaxFrames);
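Based on how it is used in fruit.cpp, the FruitEnv class exposes at least a Create() factory and an Action() method. The sketch below is inferred from those call sites only; the signatures are approximate, and anything beyond these two members is an assumption.

// Approximate FruitEnv interface, inferred from its usage in fruit.cpp
class FruitEnv
{
public:
	// allocate an environment with the given frame size and per-episode frame limit
	static FruitEnv* Create( uint32_t width, uint32_t height, uint32_t maxEpisodeFrames );

	// apply the agent's action, write the resulting reward into *reward,
	// and return true if the episode has ended
	bool Action( AgentAction action, float* reward );
};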
We can trace the handoff between the agent and the environment through the following code snippet, located in the main game loop of the fruit.cpp module:
// Ask the agent for their action
int action = 0;

if( !agent->NextAction(input_tensor, &action) )
	printf("[deepRL] agent->NextAction() failed.\n");

if( action < 0 || action >= NUM_ACTIONS )
	action = ACTION_NONE;

// Provide the agent's action to the environment
const bool end_episode = fruit->Action((AgentAction)action, &reward);
In this snippet, action is the variable that contains the agent object's next action, based on the previous environment state represented by the input_tensor variable. The reward is determined on the last line, when the action is submitted to the environment object named fruit.
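To close the loop, the reward and end-of-episode flag are handed back to the agent so the transition can be stored in replay memory and used for training. A hedged sketch of that step is below; it assumes the agent exposes a NextReward() method, and the exact call in fruit.cpp may differ.

// Hand the reward back to the agent (sketch -- assumes a NextReward() method)
if( !agent->NextReward(reward, end_episode) )
	printf("[deepRL] agent->NextReward() failed.\n");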
Quiz - Fruit Rewards
The fruit reward function can be implemented in a number of different ways. Below are several possible reward functions for the game that compare the previous and current distances between the agent and its goal. Match each to a description in the quiz below.
A
*reward = (lastDistanceSq > fruitDistSq) ? 1.0f : 0.0f;
B
*reward = (sqrtf(lastDistanceSq) - sqrtf(fruitDistSq)) * 0.5f;
C
*reward = (sqrtf(lastDistanceSq) - sqrtf(fruitDistSq)) * 0.33f;
D
*reward = exp(-(fruitDistSq/worldWidth/1.5f));
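For context on the variables above: fruitDistSq and lastDistanceSq are the squared distances between the agent and the fruit on the current and previous frames, presumably tracked inside fruitEnv.cpp. A hypothetical helper for computing such a value might look like this:

// Hypothetical helper -- squared Euclidean distance between agent and fruit.
// The real environment maintains its own equivalent of these values.
static inline float distanceSq( float agentX, float agentY, float fruitX, float fruitY )
{
	const float dx = fruitX - agentX;
	const float dy = fruitY - agentY;
	return dx * dx + dy * dy;
}

With lastDistanceSq saved from the previous frame, A grants a fixed reward whenever the agent moves closer, B and C reward in proportion to how much closer it moved, and D rewards proximity to the fruit directly.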
QUIZ QUESTION:
Match each description to the reward functions shown above.

ANSWER CHOICES:

Description | Code snippet label
---|---
 | B
 | D
 | A
 | C
Running Fruit
To test the fruit sample, open the desktop in the “Test the API” workspace, open a terminal, and once again navigate to the build directory with
$ cd /home/workspace/RoboND-DeepRL-Project/build
Launch the following executable from the terminal in the build directory:
$ cd x86_64/bin
$ ./fruit
It should achieve 85% accuracy after around 100 episodes within the default 48x48 environment.
Alternate Arguments
Optional command-line parameters for fruit can be used to change the size of the pixel array and limit the number of frames per episode:
$ ./fruit --width=64 --height=64 --episode_max_frames=100